Machine Learning Algorithm Cheat Sheet


This cheat sheet helps you choose the best machine learning algorithm for your predictive analytics solution. 
Your decision is driven by both the nature of your data and the goal you want to achieve with your data. 


Microsoft Azure


What do you want to do?


Predict between several categories (Multiclass Classification)
Answers complex questions with multiple possible answers
Answers questions like: Is this A or B or C or D?

Multiclass Logistic Regression: Fast training times, linear model
Multiclass Neural Network: Accuracy, long training times
Multiclass Decision Forest: Accuracy, fast training times
One-vs-All Multiclass: Depends on the two-class classifier
One-vs-One Multiclass: Depends on binary classifier, less sensitive to an imbalanced dataset with larger complexity
Multiclass Boosted Decision Tree: Non-parametric, fast training times and scalable


Predict between two categories (Two-Class Classification)
Answers simple two-choice questions, like yes or no, true or false
Answers questions like: Is this A or B?

Two-Class Support Vector Machine: Under 100 features, linear model
Two-Class Averaged Perceptron: Fast training, linear model
Two-Class Decision Forest: Accurate, fast training
Two-Class Logistic Regression: Fast training, linear model
Two-Class Boosted Decision Tree: Accurate, fast training, large memory footprint
Two-Class Neural Network: Accurate, long training times


Generate recommendations (Recommenders)
Predicts what someone will be interested in
Answers the question: What will they be interested in?

SVD Recommender: Collaborative filtering, better performance with lower cost by reducing dimensionality
Train Wide & Deep Recommender: Hybrid recommender, both collaborative filtering and content-based approach


Predict values (Regression)
Makes forecasts by estimating the relationship between values
Answers questions like: How much or how many?

Fast Forest Quantile Regression: Predicts a distribution
Poisson Regression: Predicts event counts
Linear Regression: Fast training, linear model
Bayesian Linear Regression: Linear model, small data sets
Decision Forest Regression: Accurate, fast training times
Neural Network Regression: Accurate, long training times
Boosted Decision Tree Regression: Accurate, fast training times, large memory footprint


Discover structure (Clustering)
Separates similar data points into intuitive groups
Answers questions like: How is this organized?

K-Means: Unsupervised learning


Find unusual occurrences (Anomaly Detection)
Identifies and predicts rare or unusual data points
Answers the question: Is this weird?

One Class SVM: Under 100 features, aggressive boundary
PCA-Based Anomaly Detection: Fast training times


Classify images (Image Classification)
Classifies images with popular networks
Answers questions like: What does this image represent?

ResNet: Modern deep learning neural network
DenseNet: Modern deep learning neural network


Extract information from text (Text Analytics)
Derives high-quality information from text
Answers questions like: What info is in this text?

Latent Dirichlet Allocation: Unsupervised topic modeling, group texts that are similar
Extract N-Gram Features from Text: Creates a dictionary of n-grams from a column of free text
Feature Hashing: Converts text data to integer encoded features using the Vowpal Wabbit library
Preprocess Text: Performs cleaning operations on text, like removal of stop-words, case normalization
Word2Vector: Converts words to values for use in NLP tasks, like recommender, named entity recognition, machine translation
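
The Preprocess Text entry mentions stop-word removal and case normalization. The sketch below shows roughly what those two cleaning operations look like in plain Python. It is not the Azure module itself: the function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for illustration, and a real stop-word list would be far longer.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption of this sketch).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

def preprocess(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stop words."""
    # Case normalization: lowercase first, then pull out alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stop-word removal: keep only tokens that are not in the list.
    return [t for t in tokens if t not in STOP_WORDS]

cleaned = preprocess("The Cat is IN the hat, and the hat is red.")
print(cleaned)
```

Lowercasing before the stop-word check means "The", "the", and "THE" are all filtered by the same entry.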
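
The Extract N-Gram Features from Text entry describes building a dictionary of n-grams from a column of free text. As a rough illustration of that idea only, the following sketch counts unigrams and bigrams over a small list of strings. The helper names `ngrams` and `build_ngram_dictionary`, the whitespace tokenizer, and the `max_n` parameter are assumptions of this example, not the module's actual options.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_ngram_dictionary(documents, max_n=2):
    """Count every 1..max_n-gram across a column of free-text documents."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts

column = ["fast training times", "long training times"]
dictionary = build_ngram_dictionary(column, max_n=2)
print(dictionary.most_common(3))
```

Each key in `dictionary` is an n-gram and each value is how often it occurs across the column, which is the kind of vocabulary a downstream model would index into.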
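
The Feature Hashing entry converts text into integer-encoded features. A minimal sketch of the general hashing trick follows, assuming a fixed number of buckets and using SHA-256 purely because it is stable across runs. This is an illustration of the technique, not the hash function or the encoding that the Vowpal Wabbit library uses; `hash_features` and `num_buckets` are hypothetical names.

```python
import hashlib

def hash_features(tokens, num_buckets=16):
    """Map tokens to a fixed-length vector of counts using the hashing trick."""
    vector = [0] * num_buckets
    for token in tokens:
        # SHA-256 is an assumption of this sketch, chosen for run-to-run
        # stability; it is not what Vowpal Wabbit uses internally.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % num_buckets
        vector[bucket] += 1
    return vector

features = hash_features(["fast", "training", "fast"], num_buckets=16)
print(features)
```

The vector length depends only on `num_buckets`, not on the vocabulary size, so no dictionary has to be stored. The trade-off is that distinct tokens can collide in the same bucket.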
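
K-Means, listed under Discover structure, groups similar points without labels. The sketch below is a bare-bones version of the usual assign-then-update loop (Lloyd's algorithm) on 2-D points. The starting centroids are supplied by hand here, which is an assumption made to keep the example deterministic; practical implementations choose them randomly or with a seeding heuristic.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

With these six points the two groups are well separated, so the assignment stops changing after the first pass.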


